Identification of novel prognostic risk signature of breast cancer based on ferroptosis-related genes

Ferroptosis is a type of cell regulated necrosis triggered by intracellular phospholipid peroxidation, which is more immunogenic than apoptosis. Therefore, genes controlling ferroptosis may be promising candidate biomarkers for tumor therapy. In this study, we investigate the function of genes associated with ferroptosis in breast cancer (BC) and systematically evaluate the relationship between ferroptosis-related gene expression and prognosis of BC patients from the Cancer Genome Atlas database. By using the consensus clustering method, 1203 breast cancer samples were clustered into two clearly divided subgroups based on the expression of 237 ferroptosis-related genes. Then differentially expressed analysis and least absolute shrinkage and selection operator were used to identify the prognosis-related genes. Furthermore, the genetic risk signature was constructed using the expression of prognosis-related genes. Our results showed that the genetic risk signature can identify patient subgroups with distinct prognosis in either training cohort or validation, and the genetic risk signature was associated with the tumor immune microenvironment. Finally, the Cox regression analysis indicated that our risk signature was an independent prognostic factor for BC patients and this signature was verified by the polymerase chain reaction and western blot. Within this study, we identified a novel prognostic classifier based on five ferroptosis-related genes which may provide a new reference for the treatment of BRCA patients.


Materials and methods
Data collection. Tumor samples used for the study were obtained from patients who underwent surgery at The First Affiliated Hospital of Zhengzhou University from December 2020 to January 2021 and had not received any prior anti-cancer treatment. Normal breast tissue samples obtained during surgery were used as control samples. Tissues were snap-frozen in liquid nitrogen until DNA/RNA isolation. The study was approved by the Ethics Committee of The First Affiliated Hospital of Zhengzhou University, and each patient's written informed consent was obtained. The patients' age and clinical conditions such as immunohistochemistry are shown in Supplementary Table 1.
The mRNA sequencing (RNA-seq) data, survival data, and corresponding clinical and molecular information of BC patients were downloaded from the Cancer Genome Atlas (TCGA, https:// portal. gdc. cancer. gov) database. In this study, specimens with no survival data were eliminated. In order to ensure the accuracy of the study, we removed the samples with no survival information and incomplete expression data. The "scale" function of R software was used to normalize the original data of gene expression. The 237 ferroptosis-related genes were retrieved from the FerrDb database and are listed in Supplementary Table 2.
To assess the classification performance of the top performing set of features or RNA transcripts, the performance was evaluated on the external validation dataset with accession GSE7390 obtained from the Gene Expression Omnibus (GEO) database. In order to validate the model on the external validation dataset, the external validation dataset was normalized as the training set. Identification of BC subtypes related to ferroptosis. Based on the expression of ferroptosis-related genes, we used unsupervised consensus clustering to identify BC subtypes related to ferroptosis. Cluster analysis was performed using the unsupervised machine learning algorithm K-Means clustering 26 by the "Consensus-ClusterPlus" package in R (http:// www.R-proje ct. org/). The following parameters were used: clustering method based on k-means, 1,000 iterations, and 80% of tumor samples sampled in each iteration. The best cluster number "k" was determined by the relative changes in the area under the cumulative distribution function (CDF) curve, the proportion of the fuzzy clustering 27 algorithm, and consensus matrix heatmaps. We used the median absolute deviation (MAD) method to explore and eliminate low-quality samples according to the expression of ferroptosis-related genes. To eliminate genes with low variability across patients, we kept genes with median absolute deviation higher than 0.5. After that, Cox analysis was performed using the "survival" package in R (http:// CRAN.R-proje ct. org/ packa ge= survi val) to evaluate the correlation between all candidate genes and overall survival (OS) rate. Validation of the risk signature in external dataset. To validate the prognostic value of the risk signature, an external validation dataset (GSE7390) with BC was downloaded from GEO (https:// www. ncbi. nlm. nih. gov/ geo/) database. Using the risk scoring system, risk score was calculated for each patient in the external validation. The patients in the validation were then classified into high-risk and low-risk, and the log-rank test was used to evaluate the difference in OS between high-risk and low-risk groups.

Correlation assessment between prognostic models and immune infiltrating cells. To ana-
lyze the relationship between the risk score and immune-cell characteristics, we used the CIBERSORT estimate (https:// ciber sort. stanf ord. edu/) software to quantify the immune cell fractions for the BC samples from TCGA. We evaluated the differences of the immune infiltration between high-and low-risk groups in TCGA BC dataset. Then, Pearson correlation analysis was used to analyze the correlation between immune cell infiltration and ferroptosis-related gene markers expression and risk score.
Prediction of drug response using risk signature. To determine the relationship between ferroptosisrelated genes and BC response rate to treatment drugs, we used risk scores to test the correlation between the prognostic model and the drug response rate. We analyzed the drug response of high-and low-risk groups and different drug response (CR: complete response, PR: partial response, PD: progressive disease, SD: stable disease) status samples. In addition, we drew the ROC curve of the risk score for the prediction of drug response and calculated the area under the ROC curve to further evaluate the application of risk scores in the prediction of drug response. Statistical analysis. Samples in TCGA and GEO validation cohorts were divided into high-and low-risk groups based on the median value of risk score. Student's t-test was used to compare the differences in the pathological and molecular characteristics of different groups. Univariate and multivariate Cox regression analyses were used to determine independent prognostic factors. Kaplan-Meier survival analysis and two-sided log-rank test were used to distinguish OS between stratified groups. The R software package "pROC" was used to draw ROC curves to analyze and predict the OS rate of BC patients. The Mann-Whitney test (P value adjusted by the BH method) was used to compare the infiltration of immune cells in the high-risk and low-risk groups. All statistical data were analyzed using the R software (version 4.0.2); P ≤ 0.05 was considered as threshold value for statistically significant.

Ethics approval and consent to participate. Approval from the Ethics Committee of First Affiliated
Hospital of Zhengzhou University(2020-KY-036-002). We can confirm that all of the methods were performed in accordance with the relevant guidelines and regulations.
Informed consent. Written informed consent was obtained from the patients/participants for publication of this article. The studies involving human participants were reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University.

Results
Classification of breast cancer based on ferroptosis-related genes. The overall workflow of our study is displayed in the Supplementary Fig. 1. Firstly we downloaded the RNA-seq data of 1,203 BC patients from TCGA database and 237 ferroptosis-related genes from the FerrDb database, and then clustered samples according to their ferroptosis-related gene expression difference. By using the "ConsensusClusterPlus" package in R (http:// www.R-proje ct. org/), the clustering number (K) was determined by the area under the cumulative distribution function (CDF) curve, which corresponds to the largest number of clusters that induced the smallest incremental change in the area under the CDF curves. When K = 2, the clustering result was relatively stable (Fig. 1A), and the area under the CDF curves was smallest (Fig. 1B). This showed that when the BC samples were clustered into two classes (Ferroptosis Cluster 1 and Ferroptosis Cluster 2), the difference of ferroptosis-related genes in the cluster was the most obvious. At the same time, the consistent heat map with other Ks (from 3 to 7) showed that the clustering boundary was fuzzy relative to the optimal number of clusters ( Supplementary  Fig. 2).
To explore the survival difference between two sample clusters, we drew the K-M survival curve based on the survival data of two clusters of samples (Fig. 1C). Survival analysis showed that the patients in the cluster C1 had longer OS (log-rank, P = 0.0045) than that in the cluster C2. This indicates that there is indeed a difference in the survival of the two different ferroptosis-related clusters. In addition, we also found that ferroptosis-related genes were differentially expressed in the two BC sample clusters (Fig. 1D), and the ferroptosis-related genes were up-regulated expression in the cluster C2. Characteristics features of these two different groups were showed in Supplementary Table 3. There were a total of 19 genes abnormally expressed in BC among these 237 ferroptosisrelated genes, including 4 up-regulated genes and 15 down-regulated genes ( Table 2).

Functional enrichment analysis of ferroptosis-related significantly different genes.
To elucidate the biological functions and pathways of the significantly differentially expressed genes (DEGs) in ferroptosis, the 19 genes were used in functional enrichment analysis. The results showed that 121 GO terms were enriched for biological processes (BP); six GO terms were enriched for cellular components (CC); and 43 GO terms were enriched for molecular functions 34 . Enriched BP terms included cellular oxidant detoxification (GO: 0098869), cellular response to toxic substance (GO: 0097237), cellular detoxification (GO: 1990748), and others. The GO results of the BP terms are shown in Fig. 2A, and the list of BP terms is provided in Supplementary  Table 4. Previous studies indicated that the activated oxidant detoxification underlies the protective mechanism of dedifferentiated transition and lineage propagation, which affects the proliferation of cancer cells 35 Table 6. The pathway enrichment analysis showed that twenty-seven KEGG pathways were significantly enriched (P < 0.05), including the TNF signaling pathway, cytosolic DNA-sensing pathway, viral protein interaction with cytokine, IL-17 signaling pathway, and others (Fig. 2D, Supplementary Table 7 www.nature.com/scientificreports/ tifunctional cytokine that plays important roles in diverse cellular events such as cell survival, proliferation, differentiation, and death, which may be involved in inflammation-associated carcinogenesis 36 . As an important part of inflammation and immune system, IL-17 signaling pathway is considered to be related to the occurrence and development of tumors 37 . For example, overexpression of IL-17 from gamma delta T cells and neutrophils conspired to promote breast cancer metastasis 38 . These findings indicated that the DEGs affect the prognosis of patients by influencing the biology pathways associated with cancer. Construction of prognostic gene signature related to breast cancer and ferroptosis. To further explore the prognostic potential of 19 ferroptosis-related DEGs, we performed univariate COX analysis on these 19 genes, and 16 DEGs were significant associated with patients' prognosis (P < 0.05) (Fig. 3A). In order to further consider whether these 16 factors can be used as independent prognostic factors, we divided the sample group into high and low expression groups according to the median expression of each gene, and then evaluated their prognostic differences using log-rank test. Then four genes were identified as independent prognostic factors The results showed that there were significant differences in the expression of these genes between BC samples and normal tissues, and they had an impact on the survival of patients. The results showed that the four genes have significant prognostic value, whereas other genes do not. However, considering the limited prognostic ability of a single gene, we further explored the prognostic ability of combined multiple genes. Then, a least absolute shrinkage and selection operator (LASSO) regression analysis was performed in the 19 DEGs to identify the most robust prognostic ferroptosis-related genes combination for BC (Fig. 4A). A total of five prognosis-associated ferroptosis-related genes were included in the LASSO Cox regression model, namely ALB, ANGPTL7, BLOC1S5-TXNDC5, IL6, and NGB (Fig. 4B). Compared with adjacent non-tumor tissues, BLOC1S5-TXNDC5 expression level was significantly higher in tumor tissue samples, (Fig. 4C) while the expression levels of the other four genes were significantly lower in tumor tissue samples (Fig. 4D-G). Then these five prognosis-associated ferroptosis-related genes were used to construct the prognostic signature.
Validation of the five-gene prognostic risk signature. According to the median value of the prognostic risk score, patients were divided into high-and low-risk groups (each n = 494). The K-M curve indicated that the OS of higher-risk patients was significantly worse than that of lower-risk patients (Fig. 5A). As shown in Fig. 5B, AUCs for 1-, 3-, and 5-year OS rates were respectively 0.536, 0.555, and 0.601, indicating that the risk signature could predict the 5-year survival rates for the BC patients better than the 1-and 3-year survival rates. Furthermore, we performed the K-M survival analysis for BC patients with different ages (AUC = 0.586) and genders (AUC = 0.5) (Fig. 5C). In addition, we also found that there were differences in the expression of 19 differentially expressed ferroptosis-related genes in the high-risk and low-risk groups (Fig. 5D); as the risk score increased, the patients' survival time decreased, and the death numbers increased (Fig. 5E).

Clinical experimental validation.
We performed PCR validation in clinical specimens following the steps  www.nature.com/scientificreports/ subtypes. We found that patients with positive ER, HER2, and PR status had lower risk scores than those with negative ER, HER2, and PR status (Fig. 7A-C). For patients of different genders, as shown in Fig. 7D, the risk score of male patients was lower than female patients (Wilcox test, P = 0.0063), indicating male patients may have better prognosis. In the analysis of BC subtypes, we found that patients with normal and basal subtypes had the highest risk signature (Fig. 7E). We also found that there were significant differences in the risk signature of patients with different methylation subtypes, the methylation subtype data of BC samples were obtained from the TCGA database, indicating that different methylation subtypes had different prognosis (Fig. 7F).
Immune cell enrichment analysis. To further explore the relationships between the risk scores and immune cells and functions, we employed CIBERSORT to compare the differential contents of 22 immune cells between high-and low-risk groups. In terms of immune cell infiltration, significant difference was found between risk score and dendrites, macrophage, mast cells, monocytes, neutrophils, NK cells, CD8 + T-cells, CD4 + T-cells, and Treg cells (Fig. 8A). As shown in Fig. 8B, Pearson correlation analysis was used to calculate the correlation coefficient between the infiltration of 22 immune cells and the five prognosis-associated ferroptosis-related genes. The results showed that the risk signature we established were related to tumor immune infiltration, highlighting that our risk signature may influence the prognosis of patients by influencing the tumor immune www.nature.com/scientificreports/ microenvironment invasion. For example, the expression of ANGPTL7 was significantly negatively correlated with the infiltration of M0 macrophages and significantly positively correlated with the degree of infiltration of CD4 + T-cells. In addition, we also performed correlation analysis between the risk assessment signature and the four immune infiltrating cells with the most significant differences in infiltration in the high-and low-risk groups. The analysis results showed that the risk score was significantly negatively correlated with the degree of infiltration of Treg cells and M0 macrophages (Fig. 8C,D), and it was significantly positively correlated with the degree of infiltration of CD4 + T-cells and M2 macrophages (Fig. 8E,F). Wang et al. found that the increasing M2 macrophage infiltration was associated with the tumor progression in esophageal squamous cell carcinoma 39 . This indicates that the infiltration of different immune cells is related to the survival of BC patients.
Predictive performance of risk score for drug response. To explore the performance of the ferroptosis-related risk signature, we constructed a ferroptosis-related risk signature in predicting the drug resistance of tumors, obtained clinical data of BC patients from TCGA, and analyzed the proportion of chemotherapy drug response and non-response in the high-and low-risk groups. We found that compared with patients in the high-risk group, patients in the low-risk group accounted for a higher proportion of drug responses (Fig. 9A) with better prognosis. It should be noted that the risk scores of patients in the complete response (CR) group were higher than those in the disease progression (PD) group (Fig. 9B). In clinical practice, some BCs with a very high proliferation index (i.e., highly malignant) often respond rapidly to cytotoxic drugs, thereby achieving a complete response (CR). Due to the existence of these cases, the risk score of patients in the complete response (CR) group increased. In addition, the ROC curve we drew showed that the predicted AUC of the risk score for the sample drug response was 0.644, which was higher than the random result (Fig. 9C). These analysis results indicate that the risk score we constructed has certain clinical guidance value in predicting and understanding drug response.  Fig. 10A, the survival probability of high-risk patients was significantly lower than that of low-risk patients (log-rank, P < 0.0001). As shown in Fig. 10B, the AUCs of ROCs for 1-, 3-, and 5-year OS rates were respectively 0.895, 0.694, and 0.536. Compared with gender and age as predictors, our risk model has higher predictive power (Fig. 10C). A nomogram was constructed to predict the possibility of 5-year and 10-year OS rates in BC patients by integrating the 5-FRG prognostic model and other clinicopathological characteristics (grade of the tumor and age of the patient). As shown in Fig. 10D, the nomogram demonstrated that the 5-FRG prognostic model was a valuable indicator for prognostic prediction.

Discussion
Ferroptosis is a form of cell death completely different from apoptosis, autophagy, and necrosis 7 . It has unique morphology, gene expression, and molecular pathways. Inhibition of glutathione, GPX4 activity, and irondependent active oxygen burst are the key factors that induce ferroptosis 40 . Small molecule drugs have been shown to promote ferroptosis, such as elastin and RSL3 41 . Therefore, ferroptosis inducers have the potential to treat tumors 14 . Because the mechanism of ferroptosis and chemical drugs-induced apoptosis is different, iron prolapse inducers may provide a new solution to the problem of tumor drug resistance 42 .
In this study, we found for the first time that the iron sensitivity of ferroptosis-related genes can divide BC patients into two categories, and the two categories of patients show significant differences in clinical and molecular characteristics. Gene markers related to ferroptosis were established. Through LASSO regression analysis, www.nature.com/scientificreports/ patients were divided into high-and low-risk groups. With a focus on genetic diversity, we established a signature based on five genes: protective genes (ALB, ANGPTL7, NGB, and IL6) and risk-related genes (BLOC1S5-TXNDC5). Thus, BC patients were divided into high-and low-risk groups to distinguish clinical outcomes. BLOC1S5-TXNDC5 is also known as MUTED-TXNDC5, and it is an example of a conjoined gene 43 . A CG is defined as a gene that produces a transcript by combining a portion of at least one exon of each of two or more different known (parent) genes located on the same chromosome, and it is usually (95%) translated independently into different proteins. Most CGs are closely related to cancer and can be used as molecular markers for clinical diagnosis and therapeutic targets, such as BCR-ABL in chronic myelogenous leukemia, ETV6-NTRK3 in secretory BC 4 , MYB-NFIB in adenoid cystic carcinoma 44 , and EML4-ALK in lung cancer 45,46 . It has been found that in addition to chromosome rearrangement, CGs can also be formed between different genes during transcription or post-transcription processing, such as trans-splicing, cis-splicing of adjacent genes (CIS-SAGE), and short homologous sequence (SHS)-mediated transcriptional sliding. A similar relationship was also observed between the expression of BLOC1S5-TXNDC5 and the parental genes 47 . Continuous cell proliferation and altered cell cycle are both markers of cancer 46 , and studies have shown that specific or highly expressed CIS-SAGE 48 in cancer cells  www.nature.com/scientificreports/ Angiopoietin-like (ANGPTL) protein is a secreted protein, similar in structure to members of the Angiopoietin family. Some angiopoietin-like proteins have pleiotropic activity and are involved in cancer lipids, glucose energy metabolism, and angiogenesis. ANGPTL7 is a low-characteristic member of this family that is considered to be related to oxidative stress and angiogenesis. Studies have reported that ANGPTL7 plays a role in promoting vascular endothelial injury and atherosclerosis 49 , and its role in tumors has gradually attracted attention. Studies have found that ANGPTL7 is over-expressed in colon cancer and less frequently so in ovarian cancer and expressed at a basal level in prostate cancer and lung cancer 50 . Its relationship with BC is still unclear. Neuroglobin (NGB) is a small intracellular monomeric globin that was first discovered in the neurons of the central and peripheral nervous system 51 . Over the years, a key neuroprotective effect has been attributed to the overexpression of NGB by neurons against several types of damage (such as hypoxia, oxidative stress, and hypoxia/glucose deficiency) [52][53][54][55] . In addition, recent results clearly indicate that high levels of intracellular NGB protein play a key role in BC cells' E2, estrogen receptor alpha dependence, and antioxidant and pro-survival effects 56,57 . NGB is the key intracellular mediator of E2 in ER α + BC cells 58 , and it is a component of the BC microenvironment. NGB can be released in the tumor microenvironment by BC cells under oxidative stress conditions where it can act as an autocrine/paracrine factor to communicate cell resilience against oxidative   www.nature.com/scientificreports/ stress and chemotherapeutic treatment 59 . Most of the samples we selected were ER + clinical specimens, and the level of NGB detected by PCR was consistent with the literature. Interleukin-6 (IL6) is a pro-inflammatory cytokine released by various cells in the tumor microenvironment, including cancer cells. IL6 levels in serum and tumor sites are elevated in several cancers, including BC 60 , and are usually accompanied by poor prognosis and low survival rates in BC patients. IL6 can affect all aspects of tumorigenesis by regulating proliferation, apoptosis, metabolism, survival, angiogenesis, and metastasis 61 . IL6 can also regulate tumor treatment resistance, such as multidrug resistance 62 . The cytokine IL6 and its downstream effector STAT3 constitute a key oncogenic pathway that has been thought to be functionally connected to estrogen receptor α (ER) in BC 27 . Albumin (ALB) is one of the best-characterized markers of hepatic progenitor cells and represents a novel biomarker for this neoplasm 63 . Apart from cholangiocarcinomas (ICCs) and hepatocellular carcinomas (HCCs), hepatoid pancreatic adenocarcinoma, breast invasive ductal carcinoma, yolk sac tumor, and acinar cell carcinoma also express albumin 64 . www.nature.com/scientificreports/ In short, a large number of previous studies have shown that these five genes are closely related to ferroptosis, which provides us with an important theoretical basis for constructing a risk model based on ferroptosis-related genes. Moreover, with the inclusion of some clinical and molecular features, we demonstrated that the risk score of ferroptosis-related genes is an independent prognostic indicator of OS for patients with BC. The risk score established with TCGA shows significant clinical differences between the two risk groups and can independently predict the prognosis of BC. The analysis results show that the risk score feature is a powerful prognostic indicator that can be used to classify patients and guide future treatment. After that, we built a nomogram to establish an individualized prognostic prediction model, in which the individual's risk in the clinical environment was quantified by integrating multiple risk factors. In addition, not only did the calibration curve show a high degree of agreement between the actual survival rate and the predicted survival rate, but the prognostic-related genes we screened in the PCR detection of clinical specimens also generated results that were consistent with our predictions. The GO BP mainly comprised response to oxidative stress and cellular response to oxidative stress. The enriched KEGG pathways were ferroptosis and fatty acid biosynthesis. As observed, our study strengths include the systematic expression profile analysis, robustness of risk scoring method, and the validation across multiple platforms among multiple populations. Despite the confirmation of the predictive value of the five gene signatures in various datasets, larger-sample prospective studies are still needed to assess the clinical relevance. In addition, compared with ferroptosis, some genes in the signature may be more strongly related to other pathways in BC. In summary, our results demonstrate that the five gene markers may be potential prognostic biomarkers, providing new insight into the research and treatment of BC.

Conclusions
Our study identified a new five-gene diagnostic signature associated with ferroptosis that can be used to predict prognosis of BC. This diagnostic signature can accurately predict the level of BC risk. It was worth noting that we had verified the reliability and applicability of this feature not only by applying it to a separate cohort but also by using PCR in detection of mRNA in our clinical tissue samples and by using western blot analysis in detection of protein in BC cell lines. In addition, the degree of invasion of immune microenvironment plays an important role in the prognosis of new genetic traits, which is helpful to find new diagnosis and treatment methods of BC.